Understanding the Performance of Statistical MT Systems: A Linear Regression Framework
Abstract
We present a framework for analyzing machine translation (MT) performance. We use multivariate linear models to determine the impact of a wide range of features on translation performance. Our assumption is that the variables that contribute most to predicting translation performance are the key to understanding the differences between good and bad translations. During training, we learn the regression parameters that best predict translation quality, using a wide range of input features based on the translation model and the first-best translation hypotheses. We use linear regression with regularization. Our results indicate that regularized linear regression achieves high levels of correlation between the predicted and actual values of the quality metrics. Our analysis shows that performance on in-domain data depends largely on the characteristics of the translation model. Out-of-domain data, on the other hand, can benefit from better reordering strategies.
Title in another language (Spanish): Modelos Lineales para el Análisis del Desempeño de la Traducción Automática (Linear Models for the Analysis of Machine Translation Performance)
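The approach summarized in the abstract (fit a regularized linear model on per-translation features, measure how well predictions correlate with the quality metric, then read off which features carry the most weight) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which regularizer is used, so an L2 (ridge) penalty is assumed here; the data is synthetic; and the function names `fit_ridge` and `pearson` are hypothetical, not taken from the paper.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression (L2 penalty is an assumption; the
    abstract only says "regularization"). The intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic stand-in for real data: each row is one translated sentence,
# each column a (hypothetical) feature, y a quality score.
rng = np.random.default_rng(0)
n_train, n_test, n_features = 200, 100, 5
true_w = np.array([0.8, -0.5, 0.0, 0.3, 0.0])
X = rng.normal(size=(n_train + n_test, n_features))
y = X @ true_w + 0.1 * rng.normal(size=n_train + n_test)

X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

w, b = fit_ridge(X_train, y_train, alpha=1.0)
pred = X_test @ w + b

# Correlation between predicted and actual quality on held-out data.
r = pearson(pred, y_test)

# Rank features by absolute weight. This is only meaningful when the
# features are on a comparable scale, as they are in this synthetic data.
ranking = np.argsort(-np.abs(w))

print("held-out Pearson r:", round(r, 3))
print("features by |weight|:", ranking.tolist())
```

With real data, the features would typically be standardized before fitting so that weight magnitudes are comparable, and `alpha` would be chosen on held-out data rather than fixed.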
Similar resources
Prediction of the waste stabilization pond performance using linear multiple regression and multi-layer perceptron neural network: a case study of Birjand, Iran
Background: Data mining (DM) is an approach used in extracting valuable information from environmental processes. This research depicts a DM approach used in extracting some information from influent and effluent wastewater characteristic data of a waste stabilization pond (WSP) in Birjand, a city in Eastern Iran. Methods: Multiple regression (MR) and neural network (NN) models were examined u...
A statistical analysis framework for bus reliability evaluation based on AVL data: A case study of Qazvin, Iran
Reliability is a fundamental factor in the operation of bus transportation systems because it is a direct indicator of the quality of service and the operator's costs. Today, the application of GPS technology in bus systems makes large volumes of data available, though it brings the difficulty of preprocessing the data methodically. In this study, the principal component an...
Application of Linear Regression and Artificial Neural Network for Broiler Chicken Growth Performance Prediction
This study was conducted to investigate the prediction of growth performance in broiler chickens using linear regression and an artificial neural network (ANN). Artificial neural networks (ANNs) are powerful tools for modeling systems in a wide range of applications. The ANN model with a back propagation algorithm successfully learned the relationship between the inputs of metabolizable energy (kca...
New Approach in Fitting Linear Regression Models with the Aim of Improving Accuracy and Power
The main contribution of this work lies in challenging the common practice of inferential statistics in the realm of simple linear regression for attaining a higher degree of accuracy when multiple observations are available, at least, at one level of the regressor variable. We derive sufficient conditions under which one can improve the accuracy of the interval estimations at quite affordable ...
Phase II monitoring of multivariate simple linear profiles with estimated parameters
In some applications of statistical process monitoring, a quality characteristic can be characterized by linear regression relationships between several response variables and one explanatory variable, which is referred to as a “multivariate simple linear profile.” It is usually assumed that the process parameters are known in Phase II. However, in most applications, this assumption is viola...